Clustering by Local Skewering

Author

  • David W. Scott
Abstract

Clustering p-dimensional data by fitting a mixture of K normals has enjoyed renewed interest (for example, see the S-Plus function “mclust”). However, the number of parameters in the model grows rapidly with the dimension p. For example, even if all the covariance matrices are assumed to be equal, the number of parameters is (K − 1) + Kp + p(p + 1)/2 for the weights, means, and covariance matrix. At ACAS in 2001, Scott introduced the partial mixture component algorithm, which fits only one component of the mixture model at a time. This algorithm requires only 1 + p + p(p + 1)/2 parameters for the weight, mean vector, and covariance matrix. In this talk, we introduce a new algorithm which attempts to find the “best” line through individual clusters. This model requires only 2p − 1 parameters. That is, the new algorithm is linear rather than quadratic in p. By repeatedly reinitializing the search algorithm, all clusters may be identified. Intuitively, the line found is approximately the leading eigenvector (the one with the largest eigenvalue) of the local covariance matrix. The GGobi visualization program will be used to illustrate the success of this algorithm on real and simulated data.
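The parameter counts quoted in the abstract can be compared numerically. The following is a minimal sketch that only evaluates the three formulas as stated; the function names, the choice K = 3, and the list of dimensions are illustrative assumptions and do not come from the talk.

```python
# Evaluate the three parameter-count formulas quoted in the abstract.
# Function names, K = 3, and the dimensions below are illustrative assumptions.

def full_mixture_params(K: int, p: int) -> int:
    """Mixture of K normals with a shared covariance: (K - 1) + K*p + p*(p + 1)/2."""
    # p*(p + 1) is always even, so integer division is exact.
    return (K - 1) + K * p + p * (p + 1) // 2


def partial_component_params(p: int) -> int:
    """Partial mixture component (one component at a time): 1 + p + p*(p + 1)/2."""
    return 1 + p + p * (p + 1) // 2


def line_model_params(p: int) -> int:
    """Line through a cluster, as counted in the abstract: 2*p - 1."""
    return 2 * p - 1


def parameter_table(K: int, dims):
    """Return (p, full mixture, partial component, line) for each dimension p."""
    return [
        (p, full_mixture_params(K, p), partial_component_params(p), line_model_params(p))
        for p in dims
    ]


if __name__ == "__main__":
    print(f"{'p':>4} {'mixture':>8} {'partial':>8} {'line':>6}")
    for p, full, partial, line in parameter_table(K=3, dims=(2, 10, 50)):
        print(f"{p:>4} {full:>8} {partial:>8} {line:>6}")
```

With K = 3 and p = 50, the formulas give 1427, 1326, and 99 parameters respectively, which shows the quadratic growth of the two covariance-based models against the linear growth of the line model.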

Similar resources

Entropy-based Consensus for Distributed Data Clustering

The increasing scale of available data and growing restrictions arising from privacy concerns are among the challenges of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with consideration for data confidentiality; i.e., it is the negotiations among local cluster centers that are used in t...

Generating Optimal Timetabling for Lecturers using Hybrid Fuzzy and Clustering Algorithms

UCTTP is an NP-hard problem that must be solved anew each semester. The main technique in the presented approach is to analyze data to resolve uncertainties in lecturers’ preferences and constraints within a department, in order to obtain a ranking for each lecturer based on their requirements, with the aim of increasing their satisfaction and develo...

A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are widely used in many fields, much research is still under way to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of the cluster centers and can therefore become trapped in a local optimum. In this paper...

Image Annotation Using a Semi-Supervised Spectral Clustering Algorithm

Abstract: With the growth in the number of digital images, efficient methods for annotating them are needed. In this paper, semi-supervised spectral clustering with relevance feedback is used to annotate digital photos; it overcomes the local minima problem of clustering methods by using some labeled information given by users. Performance of the proposed method is tested on the Corel 5K dataset ...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Publication date: 2005